ABSTRACT

An $R \times N$ matrix is generated in the following way. In each row a predetermined number of positions are randomly assigned the value 1; the remaining positions are assigned the value 0. For each column a real-valued function of the elements is given. In this paper the sum of the values of these functions is studied when $N \to \infty$. The results can be applied to, e.g., "committee" problems and contingency tables of 0-1 variables.
SOME MATRIX OCCUPANCY PROBLEMS WITH DICHOTOMOUS ENTRIES


Lars Holst


1. Introduction

In Feller (1968), p. 112, exercise 16, the following problem is posed: "A cell contains $N$ chromosomes, between any two of which an interchange of parts may occur. If $R$ interchanges occur (which can happen in $\binom{N}{2}^R$ distinct ways), find the probability that exactly $m$ chromosomes will be involved."

Various generalizations of this problem have been considered, e.g. committee problems. A typical generalization is: "$R$ committees are formed from $N$ individuals, the $j$:th committee has size $n_j$, and the committees are formed by independent simple random sampling; find the distribution of the number of individuals belonging to all committees."

This type of problem has also been applied to a certain health problem; see Mantel (1974) and the references given there.

Let us consider the following situation. Consider a finite population of $N$ units. Take $R$ independent simple random samples without replacement of sizes $n_1, \ldots, n_R$. Define $Y_{jk} = 1$ if the unit $k$ occurs in the $j$:th sample, and otherwise let $Y_{jk} = 0$, for $j = 1, \ldots, R$ and $k = 1, \ldots, N$. Consider the $R \times N$ matrix, or special contingency table, with the $Y$'s as entries. This matrix has fixed row totals $n_1, \ldots, n_R$
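A minimal simulation sketch of this sampling scheme, in Python (the function names `occupancy_matrix` and `full_columns` are illustrative and not from the paper): each row is an independent simple random sample without replacement drawn with `random.Random.sample`, so the row totals are fixed by construction while the column totals are random.

```python
import random

def occupancy_matrix(N, sizes, rng):
    """R x N 0-1 matrix: entry (j, k) is 1 iff unit k is in the j:th sample.

    Row j is a simple random sample without replacement of size sizes[j]
    from the units 0, ..., N-1, drawn independently of the other rows.
    """
    Y = []
    for n_j in sizes:
        chosen = set(rng.sample(range(N), n_j))
        Y.append([1 if k in chosen else 0 for k in range(N)])
    return Y

def full_columns(Y):
    """Number of columns without zeros (units that are in every sample)."""
    return sum(all(col) for col in zip(*Y))

rng = random.Random(1)
Y = occupancy_matrix(10, [3, 5, 4], rng)
print([sum(row) for row in Y])  # always [3, 5, 4]: the row totals are fixed
print(full_columns(Y))          # random, between 0 and 3
```

In the committee reading, `full_columns` counts the individuals who belong to every committee.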
2. The characteristic function

Let the $X$'s and $Y$'s be defined as above and consider, for a given function $f$, the random variables
$$Z = f(Y_{11}, \ldots, Y_{RN}), \qquad U = f(X_{11}, \ldots, X_{RN}).$$


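Theorem 1. The following relation holds:
$$E(e^{itZ}) = a_N \prod_{j=1}^R \left(\frac{Np_j(1-p_j)}{2\pi}\right)^{1/2} \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} E\Big(\exp\Big(itU + i\sum_{j=1}^R (X_{j\cdot} - Np_j)\theta_j\Big)\Big)\, d\theta_1 \cdots d\theta_R,$$
where
$$a_N = \prod_{j=1}^R \left(\binom{N}{n_j} p_j^{n_j}(1-p_j)^{N-n_j} \big(2\pi Np_j(1-p_j)\big)^{1/2}\right)^{-1}.$$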

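Proof. The conditional distribution of $(X_{j1}, \ldots, X_{jN})$ given $X_{j\cdot} = n_j$ is the same as the distribution of $(Y_{j1}, \ldots, Y_{jN})$. Since the rows are independent, we therefore have
$$E(e^{itZ}) = E(e^{itU} \mid X_{j\cdot} = n_j,\ j = 1, \ldots, R).$$
The distribution of the $j$:th row is given by
$$P(Y_{j1} = v_{j1}, \ldots, Y_{jN} = v_{jN}) = 1\Big/\binom{N}{n_j}$$
for $v_{jk} = 0, 1$ and $v_{j1} + \cdots + v_{jN} = n_j$. Furthermore, since $Np_j = n_j$ and $\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{im\theta}\,d\theta$ equals 1 for $m = 0$ and 0 for every other integer $m$, we have
$$\int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} E\Big(\exp\Big(itU + i\sum_{j=1}^R (X_{j\cdot} - Np_j)\theta_j\Big)\Big)\, d\theta_1 \cdots d\theta_R = \sum_{v} \exp(itf(v)) \cdot (2\pi)^R \prod_{j=1}^R \frac{P(X_{j\cdot} = n_j)}{\binom{N}{n_j}} = E(e^{itZ}) \cdot (2\pi)^R \prod_{j=1}^R P(X_{j\cdot} = n_j),$$
where the sum is over all 0-1 matrices $v = (v_{jk})$ with row sums $n_1, \ldots, n_R$, from which the assertion follows.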
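Theorem 1 can be checked numerically on a tiny case. The Python sketch below (all names hypothetical) assumes, as in the proof, that the $X_{jk}$ are independent Bernoulli($p_j$) variables with $p_j = n_j/N$. It computes the left side by enumerating every outcome of the $R$ samples and the right side by a Riemann sum over $[-\pi, \pi)^R$, which is exact here because the integrand is a trigonometric polynomial of degree at most $N$ in each $\theta_j$. The two values should agree to rounding error for any $f$ of the whole matrix.

```python
import cmath
import itertools
import math

def exact_cf(t, N, sizes, f):
    """Left side: E exp(itZ), Z = f(Y), by enumerating all outcomes of the R samples."""
    rows = [list(itertools.combinations(range(N), n)) for n in sizes]
    total, count = 0j, 0
    for choice in itertools.product(*rows):
        Y = tuple(tuple(int(k in c) for k in range(N)) for c in choice)
        total += cmath.exp(1j * t * f(Y))
        count += 1
    return total / count

def theorem1_cf(t, N, sizes, f, grid=16):
    """Right side of Theorem 1, with X_jk independent Bernoulli(p_j), p_j = n_j / N."""
    R = len(sizes)
    p = [n / N for n in sizes]
    # c[r] = E(exp(itU); row sums of X equal r), where U = f(X)
    c = {}
    for bits in itertools.product((0, 1), repeat=R * N):
        X = tuple(bits[j * N:(j + 1) * N] for j in range(R))
        prob = math.prod(p[j] if x else 1 - p[j] for j in range(R) for x in X[j])
        r = tuple(sum(row) for row in X)
        c[r] = c.get(r, 0j) + prob * cmath.exp(1j * t * f(X))
    # Riemann sum over [-pi, pi)^R of E exp(itU + i sum_j (X_j. - N p_j) theta_j);
    # N p_j = n_j, and the sum is exact because grid > N.
    h = 2 * math.pi / grid
    nodes = [-math.pi + h * m for m in range(grid)]
    integral = 0j
    for theta in itertools.product(nodes, repeat=R):
        for r, cr in c.items():
            phase = sum(th * (rj - n) for th, rj, n in zip(theta, r, sizes))
            integral += cr * cmath.exp(1j * phase)
    integral *= h ** R
    a_N = math.prod(
        1 / (math.comb(N, n) * pj ** n * (1 - pj) ** (N - n)
             * math.sqrt(2 * math.pi * N * pj * (1 - pj)))
        for n, pj in zip(sizes, p))
    prefactor = a_N * math.prod(math.sqrt(N * pj * (1 - pj) / (2 * math.pi)) for pj in p)
    return prefactor * integral

def full_columns(Y):
    return sum(all(col) for col in zip(*Y))

for t in (0.0, 0.7, 2.0):
    print(t, abs(exact_cf(t, 5, (2, 3), full_columns) - theorem1_cf(t, 5, (2, 3), full_columns)))
```

Both sides cost on the order of $2^{RN}$ operations, so this only checks the identity; it is not a computational method.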


Using Stirling's formula we get:

Lemma. If $Np_j(1 - p_j) \to \infty$ for all $j$, then $a_N \to 1$. If $p_j \to \gamma_j$, $0 < \gamma_j < 1$, for all $j$, then $a_N = 1 + O(1/N)$.
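To see the Lemma at work, the Python sketch below (the helper `a_factor` is hypothetical) evaluates one factor of $a_N$, i.e. $\big(\binom{N}{n}p^n(1-p)^{N-n}(2\pi Np(1-p))^{1/2}\big)^{-1}$ with $p = n/N$, using `math.lgamma` to avoid overflow. Keeping the $1/(12n)$ terms of Stirling's series predicts $N(a - 1) \to \frac{1}{12}\big(\frac{1}{p(1-p)} - 1\big)$, about $0.3135$ for $p = 0.3$, which is the $O(1/N)$ behaviour stated for fixed $p$.

```python
import math

def a_factor(N, n):
    """One factor of a_N: 1 / (C(N, n) p^n (1-p)^(N-n) sqrt(2 pi N p (1-p))), p = n / N."""
    p = n / N
    log_pmf = (math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
               + n * math.log(p) + (N - n) * math.log(1 - p))
    return math.exp(-log_pmf - 0.5 * math.log(2 * math.pi * N * p * (1 - p)))

for N in (10, 100, 1000, 10000):
    a = a_factor(N, 3 * N // 10)          # p = 0.3 held fixed
    print(N, a, N * (a - 1))              # N (a - 1) stays near 0.3135
```

Since $a_N$ is a product of $R$ such factors, the same $O(1/N)$ rate carries over for fixed $R$.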
3. Limit theorems

In this section we consider random variables of the form
$$Z_N = \sum_{k=1}^N f_k(Y_{1k}, \ldots, Y_{Rk}).$$
As $(X_{1k}, \ldots, X_{Rk})$ and $(Y_{1k}, \ldots, Y_{Rk})$ have the same distribution, they also have the same moments, so
$$E(f_k(X_{1k}, \ldots, X_{Rk})) = E(f_k(Y_{1k}, \ldots, Y_{Rk})) = \mu_k.$$

Theorem 2. Suppose that there exists a $q_0 < 1$ such that for $q > q_0$, with
$$M = [Nq] \quad\text{and}\quad U_M = \sum_{k=1}^M f_k(X_{1k}, \ldots, X_{Rk}),$$
we have
$$\mathcal{L}\Big(U_M - \sum_{k=1}^M \mu_k,\ \sum_{k=1}^M \frac{X_{1k} - p_1}{(Np_1(1-p_1))^{1/2}},\ \ldots,\ \sum_{k=1}^M \frac{X_{Rk} - p_R}{(Np_R(1-p_R))^{1/2}}\Big) \to N\left(0,\ \begin{pmatrix} A_q & B_{1q} & \cdots & B_{Rq} \\ B_{1q} & q & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ B_{Rq} & 0 & \cdots & q \end{pmatrix}\right)$$
when $N \to \infty$, where $A_q \to A$ and $B_{jq} \to B_j$, all $j$, when $q \to 1$. Then, when $N \to \infty$,
$$\mathcal{L}\Big(Z_N - \sum_{k=1}^N \mu_k\Big) \to N\Big(0,\ A - \sum_{j=1}^R B_j^2\Big).$$
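Proof. Without loss of generality we can suppose that $\mu_k = 0$. Define $Z_M$ analogously to $U_M$. Set $X_{j,M} = \sum_{k=1}^M X_{jk}$ and
$$\tilde X_{j,M} = \sum_{k=M+1}^N X_{jk} = X_{j\cdot} - X_{j,M}.$$
Let us also introduce $\sigma_j = (Np_j(1 - p_j))^{1/2}$. From Theorem 1 it follows that
$$E(e^{itZ_M}) = a_N (2\pi)^{-R/2} \sigma_1 \cdots \sigma_R \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} E\Big(\exp\Big(itU_M + i\sum_{j=1}^R (X_{j\cdot} - Np_j)\theta_j\Big)\Big)\, d\theta_1 \cdots d\theta_R$$
$$= a_N (2\pi)^{-R/2} \int_{-\pi\sigma_1}^{\pi\sigma_1} \cdots \int_{-\pi\sigma_R}^{\pi\sigma_R} E\Big(\exp\Big(itU_M + i\sum_{j=1}^R (X_{j,M} - Mp_j)\frac{\psi_j}{\sigma_j}\Big)\Big) \cdot E\Big(\exp\Big(i\sum_{j=1}^R (\tilde X_{j,M} - (N - M)p_j)\frac{\psi_j}{\sigma_j}\Big)\Big)\, d\psi_1 \cdots d\psi_R$$
$$= a_N (2\pi)^{-R/2} \int \cdots \int g_M(t, \psi_1, \ldots, \psi_R)\, h_M(\psi_1, \ldots, \psi_R)\, d\psi_1 \cdots d\psi_R.$$
Here we have substituted $\psi_j = \sigma_j\theta_j$ and used the independence of the first $M$ and the last $N - M$ columns of the $X$-matrix; $g_M$ and $h_M$ denote the two expectations, and the last integral extends over $|\psi_j| \le \pi\sigma_j$, $j = 1, \ldots, R$.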
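From De Moivre's theorem we have
$$h_M(\psi_1, \ldots, \psi_R) \to \exp\Big(-\frac12 \sum_{j=1}^R \psi_j^2 (1 - q)\Big) = h(\psi_1, \ldots, \psi_R),$$
and it is not difficult to prove that
$$\int |h_M| \to \int h.$$
From the assumptions we get
$$g_M(t, \psi_1, \ldots, \psi_R) \to \exp\Big(-\frac12\Big(t^2 A_q + 2t\sum_{j=1}^R \psi_j B_{jq} + q\sum_{j=1}^R \psi_j^2\Big)\Big).$$
As $|g_M| \le 1$ and $a_N \to 1$, it follows from the extended Lebesgue convergence theorem, see Rao (1973), p. 136, that
$$E(e^{itZ_M}) \to (2\pi)^{-R/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\Big(-\frac12\Big(t^2 A_q + 2t\sum_{j=1}^R \psi_j B_{jq} + \sum_{j=1}^R \psi_j^2\Big)\Big)\, d\psi_1 \cdots d\psi_R = \exp\Big(-\frac12 t^2\Big(A_q - \sum_{j=1}^R B_{jq}^2\Big)\Big).$$
Thus we have proved that
$$\mathcal{L}(Z_M) \to N\Big(0,\ A_q - \sum_{j=1}^R B_{jq}^2\Big).$$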
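The last equality in the limit computation above is a Gaussian integral. The limits of $g_M$ and $h$ together contribute $q\sum_j\psi_j^2 + (1 - q)\sum_j\psi_j^2 = \sum_j\psi_j^2$ to the exponent, and completing the square in each $\psi_j$ separately gives
$$\int_{-\infty}^{\infty} \exp\Big(-\frac12\psi_j^2 - tB_{jq}\psi_j\Big)\,d\psi_j = \exp\Big(\frac12 t^2B_{jq}^2\Big)\int_{-\infty}^{\infty} \exp\Big(-\frac12(\psi_j + tB_{jq})^2\Big)\,d\psi_j = (2\pi)^{1/2}\exp\Big(\frac12 t^2B_{jq}^2\Big),$$
so that the $R$-fold integral multiplied by $(2\pi)^{-R/2}$ equals
$$\exp\Big(-\frac12 t^2A_q\Big)\prod_{j=1}^R \exp\Big(\frac12 t^2B_{jq}^2\Big) = \exp\Big(-\frac12 t^2\Big(A_q - \sum_{j=1}^R B_{jq}^2\Big)\Big).$$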


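Analogously we prove
$$\mathcal{L}(Z_N - Z_M) \to N(0, V_q^2), \quad\text{say}.$$
The assumptions imply that
$$A_q - \sum_{j=1}^R B_{jq}^2 \to A - \sum_{j=1}^R B_j^2 \quad\text{and}\quad V_q^2 \to 0 \quad\text{as } q \to 1.$$
Using the argument by LeCam (1958), pp. 13-14, we obtain
$$\mathcal{L}(Z_N) \to N\Big(0,\ A - \sum_{j=1}^R B_j^2\Big).$$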


When the same function $f$ is used for each column, we get a simple and useful theorem.

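Theorem 3. Suppose that for some random vector $(U, V_1, \ldots, V_R)$ and $\sigma_j = (Np_j(1 - p_j))^{1/2}$ we have
$$\mathcal{L}\Big(U_N - N\mu,\ \frac{X_{1\cdot} - Np_1}{\sigma_1},\ \ldots,\ \frac{X_{R\cdot} - Np_R}{\sigma_R}\Big) \to \mathcal{L}(U, V_1, \ldots, V_R),$$
where $U_N = \sum_{k=1}^N f(X_{1k}, \ldots, X_{Rk})$ and $\mu = E f(X_{11}, \ldots, X_{R1})$. Then the infinitely divisible random vector $(U, V_1, \ldots, V_R)$ has the characteristic function
$$E\Big(e^{itU + i\sum_{j=1}^R s_jV_j}\Big) = H(t, s) = \phi(t) \cdot \exp\Big(-\frac12\Big(t^2 A + 2\sum_{j=1}^R t s_j B_j + \sum_{j=1}^R s_j^2\Big)\Big),$$
where
$$E(e^{itU}) = \phi(t) \cdot e^{-\frac12 t^2 A}$$
and $\phi$ has no normal component. Furthermore,
$$\mathcal{L}(Z_N - N\mu) \to \mathcal{L}(Z),$$
where the random variable $Z$ has the characteristic function
$$E(e^{itZ}) = \phi(t) \cdot \exp\Big(-\frac12 t^2\Big(A - \sum_{j=1}^R B_j^2\Big)\Big).$$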

Proof. The first part of the assertion follows from classical limit theorems, cf. LeCam (1958), p. 8.

Without loss of generality let us assume that $\mu = 0$. We observe that with $M = [Nq]$, $0 < q < 1$,
$$E\Big(\exp\Big(itU_M + i\sum_{j=1}^R s_j(X_{j,M} - Mp_j)/\sigma_j\Big)\Big) \to (H(t, s))^q,$$
using the notation of the previous proof. The assertion then follows as in Theorem 2.
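One way to read the limiting variance $A - \sum_j B_j^2$ of Theorems 2 and 3: it is the variance of a normal component after conditioning on the normalized row sums being zero, the Gaussian counterpart of conditioning on $X_{j\cdot} = n_j$ in Theorem 1. If $(G, W_1, \ldots, W_R)$ is normal with mean zero, $\operatorname{Var} G = A$, $\operatorname{Cov}(G, W_j) = B_j$ and $\operatorname{Cov}(W_i, W_j) = \delta_{ij}$ (the covariance matrix of Theorem 2 at $q = 1$; the symbols $G$ and $W_j$ are introduced only for this remark), the usual formula for conditional normal distributions gives
$$\operatorname{Var}(G \mid W_1 = \cdots = W_R = 0) = A - (B_1, \ldots, B_R)\, I_R^{-1}\, (B_1, \ldots, B_R)^{\mathsf T} = A - \sum_{j=1}^R B_j^2 \ge 0.$$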

Remark 1. If $p_j \to \gamma_j$, $0 < \gamma_j < 1$, all $j$, then the limit distribution has no non-normal component, because $f(X_{1k}, \ldots, X_{Rk})$ can take at most $2^R$ different values. Hence non-normal limits can only occur when some $p_j \to 0$ or $1$.

Remark 2. Many of the theorems in Eicker et al. (1972) are special cases of Theorem 3.

The case with no normal component is particularly simple.

Theorem 4. Suppose that
$$\mathcal{L}(U_N - N\mu) \to \mathcal{L}(U),$$
where $U$ has no normal component. Then
$$\mathcal{L}(Z_N - N\mu) \to \mathcal{L}(U).$$

Proof. As $U_N - N\mu$ and $(X_{j\cdot} - Np_j)/\sigma_j$, $j = 1, \ldots, R$, converge in distribution, from every subsequence of the vectors in Theorem 3 we can select a convergent subsequence. For such a convergent subsequence we can apply Theorem 3. As $U$ has no normal component, we have $A = 0$, and since $\sum_j B_j^2 \le A$ (the covariance matrix of the normal component is nonnegative definite), $B_1 = \cdots = B_R = 0$. Thus the limiting characteristic function is just $\phi(t)$. As this limit does not depend on the particular subsequence, it follows that
$$E(\exp(it(Z_N - N\mu))) \to \phi(t),$$
which proves the theorem.
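If we consider sequences such that $f_1 = \cdots = f_N = f$ and $p_j = \gamma_j$, $0 < \gamma_j < 1$, independent of $N$, then the following local limit theorem holds.

Theorem 5. Suppose that the random variable $f(Y_{11}, \ldots, Y_{R1})$ is integer valued and one-lattice (its maximal span is 1). Then, uniformly in $v$, when $N \to \infty$,
$$P\Big(\sum_{k=1}^N f(Y_{1k}, \ldots, Y_{Rk}) = v\Big) \cdot \big(N\sigma^2(1 - \rho^2)\big)^{1/2} \cdot (2\pi)^{1/2} - \exp\Big(-\frac{(v - N\mu)^2}{2N\sigma^2(1 - \rho^2)}\Big) \to 0,$$
where
$$\mu = E f(Y_{11}, \ldots, Y_{R1}), \qquad \sigma^2 = \operatorname{Var} f(Y_{11}, \ldots, Y_{R1}),$$
and
$$\rho^2 = \sum_{j=1}^R \frac{\big(\operatorname{Cov}(f(Y_{11}, \ldots, Y_{R1}), Y_{j1})\big)^2}{\sigma^2\gamma_j(1 - \gamma_j)}.$$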
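The quantities $\mu$, $\sigma^2$ and $\rho^2$ depend only on $f$ and the $\gamma_j$, since $(Y_{11}, \ldots, Y_{R1})$ has independent Bernoulli($\gamma_j$) components. The Python sketch below (helper names hypothetical) computes them by enumerating $\{0, 1\}^R$ and compares Theorem 5 with an exactly solvable case: for $R = 2$ and $f(y_1, y_2) = y_1y_2$ the sum is the size of the intersection of the two samples, which is hypergeometric, and one finds $\sigma^2(1 - \rho^2) = \gamma_1\gamma_2(1 - \gamma_1)(1 - \gamma_2)$.

```python
import itertools
import math

def moments(f, gammas):
    """mu, sigma^2, rho^2 of Theorem 5; (Y_11, ..., Y_R1) independent Bernoulli(gamma_j)."""
    pts = [(y, math.prod(g if yj else 1 - g for yj, g in zip(y, gammas)))
           for y in itertools.product((0, 1), repeat=len(gammas))]
    mu = sum(w * f(y) for y, w in pts)
    var = sum(w * (f(y) - mu) ** 2 for y, w in pts)
    rho2 = sum(sum(w * (f(y) - mu) * (y[j] - g) for y, w in pts) ** 2
               / (var * g * (1 - g)) for j, g in enumerate(gammas))
    return mu, var, rho2

def hypergeom_pmf(v, N, n1, n2):
    """P(|S1 & S2| = v) for independent uniform subsets of sizes n1, n2 (R = 2)."""
    return math.comb(n2, v) * math.comb(N - n2, n1 - v) / math.comb(N, n1)

def max_local_error(N, n1, n2):
    """sup over v of |P(Z_N = v) sqrt(2 pi s2) - exp(-(v - N mu)^2 / (2 s2))|, f = y1 * y2."""
    mu, var, rho2 = moments(lambda y: y[0] * y[1], (n1 / N, n2 / N))
    s2 = N * var * (1 - rho2)
    return max(abs(hypergeom_pmf(v, N, n1, n2) * math.sqrt(2 * math.pi * s2)
                   - math.exp(-(v - N * mu) ** 2 / (2 * s2)))
               for v in range(max(0, n1 + n2 - N), min(n1, n2) + 1))

print(moments(lambda y: y[0] * y[1], (0.3, 0.6)))
print(max_local_error(200, 60, 120), max_local_error(2000, 600, 1200))
```

The hypergeometric case is used only because its exact law is available in closed form; for other $f$ the exact probabilities would have to come from enumeration or simulation.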

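Proof. Using Theorem 1 and the inversion formula for characteristic functions we obtain, with $U_N = \sum_{k=1}^N f(X_{1k}, \ldots, X_{Rk})$,
$$P\Big(\sum_{k=1}^N f(Y_{1k}, \ldots, Y_{Rk}) = v\Big) = a_N \sigma_1 \cdots \sigma_R (2\pi)^{-(R+2)/2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} E\Big(\exp\Big(itU_N + i\sum_{j=1}^R \theta_j(X_{j\cdot} - Np_j)\Big)\Big) e^{-itv}\, dt\, d\theta_1 \cdots d\theta_R$$
$$= a_N \sigma_1 \cdots \sigma_R (2\pi)^{R/2} P(U_N = v,\ X_{j\cdot} = n_j,\ j = 1, \ldots, R) = a_N \cdot p_N(v, n_1, \ldots, n_R), \quad\text{say}.$$
Using a multidimensional local limit theorem by Rvaceva (1954), p. 202, we have uniformly in $v$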
4. Applications


Example 1. Suppose that the number of columns with no zeros is of interest. In the committee problem this corresponds to the number of persons who are members of all committees. The appropriate function for this case is
$$f(Y_{1k}, \ldots, Y_{Rk}) = Y_{1k} \cdot Y_{2k} \cdots Y_{Rk}.$$
Suppose now that $N, n_1, \ldots, n_R \to \infty$ so that $Np_1 \cdots p_R \to \lambda$, $0 < \lambda < \infty$. We have
$$P(f(X_{1k}, \ldots, X_{Rk}) = 1) = p_1 \cdots p_R,$$
so from the usual Poisson approximation of the binomial it follows that
$$\mathcal{L}\Big(\sum_{k=1}^N f(X_{1k}, \ldots, X_{Rk})\Big) \to \mathrm{Po}(\lambda).$$
Hence by Theorem 4 the number of columns with no zero is in the limit $\mathrm{Po}(\lambda)$-distributed.
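Example 1 can be checked exactly, because intersecting the set of units common to the first samples with the next independent sample is a hypergeometric step. The Python sketch below (helper names hypothetical) builds the exact distribution of the number of full columns in this way and measures its total variation distance to $\mathrm{Po}(\lambda)$ with $\lambda = Np_1 \cdots p_R$. For $R = 2$, $N = 10^5$ and $n_1 = n_2 = 316$ ($\lambda \approx 0.9986$) the distance should be below $0.01$, as standard total-variation bounds for the hypergeometric-to-binomial and binomial-to-Poisson steps suggest.

```python
import math

def hyper_logpmf(k, N, m, n):
    """log P(|A & S| = k): A a fixed set of m units, S a uniform n-subset of N units."""
    lg = math.lgamma
    return (lg(m + 1) - lg(k + 1) - lg(m - k + 1)
            + lg(N - m + 1) - lg(n - k + 1) - lg(N - m - n + k + 1)
            - lg(N + 1) + lg(n + 1) + lg(N - n + 1))

def full_column_pmf(N, sizes):
    """Exact law of the number of columns with no zeros, as {value: probability}.

    The units common to the first j samples form a set of random size m;
    intersecting it with the next, independent sample of size n is a
    hypergeometric(N, m, n) step.
    """
    dist = {sizes[0]: 1.0}
    for n in sizes[1:]:
        new = {}
        for m, w in dist.items():
            for k in range(max(0, m + n - N), min(m, n) + 1):
                new[k] = new.get(k, 0.0) + w * math.exp(hyper_logpmf(k, N, m, n))
        dist = new
    return dist

def tv_to_poisson(N, sizes):
    """Total variation distance to Po(lam), lam = N p_1 ... p_R (far Poisson tail ignored)."""
    lam = N * math.prod(n / N for n in sizes)
    dist = full_column_pmf(N, sizes)
    poisson = lambda k: math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return 0.5 * sum(abs(dist.get(k, 0.0) - poisson(k)) for k in range(max(dist) + 51))

print(tv_to_poisson(100_000, (316, 316)))
```

The recursion works for any $R$; its cost grows roughly like $R \cdot \min_j n_j^2$, so moderate sample sizes are cheap.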

Example 2. In connection with a health research problem, Mantel (1974) proposes tests for differences between columns. Essentially, the sample variance of the column totals is suggested as a test statistic. In our notation Mantel's statistic can be written
$$Z_N = (N - 1)\sum_{k=1}^N \Big(\sum_{j=1}^R (Y_{jk} - p_j)\Big)^2 \Big/ \Big(N\sum_{j=1}^R p_j(1 - p_j)\Big).$$
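A quick Monte Carlo sketch in Python (helper names hypothetical, sample sizes chosen arbitrarily) computes Mantel's statistic exactly as written above from simulated matrices. With $N = 60$ and $(n_1, n_2, n_3) = (12, 30, 42)$ the sample mean should be close to $N - 1 = 59$, while the sample variance should come out clearly below the $\chi^2(N - 1)$ value $2(N - 1) = 118$, in line with the discussion that follows.

```python
import random
import statistics

def mantel_statistic(Y, sizes):
    """Z_N = (N-1) * sum_k (sum_j (Y_jk - p_j))^2 / (N * sum_j p_j (1 - p_j))."""
    N = len(Y[0])
    p = [n / N for n in sizes]
    S = sum(pj * (1 - pj) for pj in p)
    ss = sum(sum(Y[j][k] - p[j] for j in range(len(p))) ** 2 for k in range(N))
    return (N - 1) * ss / (N * S)

def simulate(N, sizes, reps, seed=0):
    """Draw reps independent occupancy matrices and return their Z_N values."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        Y = []
        for n in sizes:
            chosen = set(rng.sample(range(N), n))
            Y.append([1 if k in chosen else 0 for k in range(N)])
        out.append(mantel_statistic(Y, sizes))
    return out

zs = simulate(60, (12, 30, 42), reps=4000)
print(statistics.fmean(zs), statistics.variance(zs))
```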

Using normal random variables, Mantel approximates the distribution of $Z_N$ by $\chi^2(N - 1)$. A limit distribution of $Z_N$ can be obtained from Theorem 3 if $n_j/N \to \gamma_j$, $0 < \gamma_j < 1$. The limit law has no non-normal component and, after some calculations, one finds
$$A - \sum_{j=1}^R B_j^2 = 2\,\frac{\big(\sum_{j=1}^R \gamma_j(1 - \gamma_j)\big)^2 - \sum_{j=1}^R \big(\gamma_j(1 - \gamma_j)\big)^2}{\big(\sum_{j=1}^R \gamma_j(1 - \gamma_j)\big)^2}.$$

We may note that in fact the exact mean and variance are
$$E Z_N = N - 1,$$
$$\operatorname{Var} Z_N = 2(N - 1)\Big(1 - \sum_{j=1}^R \big(p_j(1 - p_j)\big)^2 \Big/ \Big(\sum_{j=1}^R p_j(1 - p_j)\Big)^2\Big).$$
By Theorem 3 we can state
$$\mathcal{L}\Big(\frac{Z_N - (N - 1)}{(2(N - 1))^{1/2}}\Big) \to N(0, 1 - K),$$
where
$$K = \sum_{j=1}^R \big(\gamma_j(1 - \gamma_j)\big)^2 \Big/ \Big(\sum_{j=1}^R \gamma_j(1 - \gamma_j)\Big)^2.$$
So the limit distribution has smaller variance than the chi-square approximation indicates. By the Cauchy-Schwarz inequality we get $K \ge 1/R$, with equality if and only if $\gamma_1(1 - \gamma_1) = \cdots = \gamma_R(1 - \gamma_R)$. Unless $R$ is big and the $\gamma$'s are not too unequal, the chi-square approximation is likely to give a conservative test.

Mantel also discusses a $\chi^2(1)$ approach. This is actually the same as using the normal approximation suggested by the limit law above.

As Mantel points out, the distribution of $Z_N$ is right skew in typical cases. Thus the normal approximation may be inappropriate. A right-skew distribution having the right mean, variance and limit law is $(1 - K) \cdot \chi^2((N - 1)/(1 - K))$. One may expect that this distribution approximates that of $Z_N$ better than any of the other approximations.
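The moment matching behind this proposal is easy to verify. The Python sketch below (hypothetical helpers; it plugs the $p_j$ in for the $\gamma_j$, which is what the exact variance formula uses) returns the scale $c = 1 - K$ and the degrees of freedom $\nu = (N - 1)/(1 - K)$, and the law $c\,\chi^2(\nu)$ then has mean $c\nu = N - 1$ and variance $2c^2\nu = 2(N - 1)(1 - K)$. Critical values could be taken from any chi-square routine, e.g. `c * scipy.stats.chi2.ppf(0.95, nu)`.

```python
def K_value(ps):
    """K = sum_j (p_j (1 - p_j))^2 / (sum_j p_j (1 - p_j))^2."""
    v = [p * (1 - p) for p in ps]
    return sum(x * x for x in v) / sum(v) ** 2

def scaled_chi2(N, ps):
    """(c, nu) with c = 1 - K and nu = (N - 1) / (1 - K), for the law c * chi2(nu)."""
    c = 1 - K_value(ps)
    if c <= 1e-12:
        # K = 1 up to rounding: R = 1 (or one informative row), and Z_N is constant
        raise ValueError("degenerate case, K = 1")
    return c, (N - 1) / c

N, ps = 60, (0.2, 0.5, 0.7)
c, nu = scaled_chi2(N, ps)
mean, var = c * nu, 2 * c * c * nu  # moments of c * chi2(nu)
print(K_value(ps), c, nu, mean, var)
```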